-
Notifications
You must be signed in to change notification settings - Fork 394
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
docs: repro: add --pull #1841
docs: repro: add --pull #1841
Conversation
@@ -154,6 +155,9 @@ up-to-date and only execute the final stage. | |||
corresponding pipelines, including the target stages themselves. This option | |||
has no effect if `targets` are not provided. | |||
|
|||
- `--pull` - try automatically pulling cached outputs if they are not present in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay, a few questions:
try
- what happens if it fails?pulling
- make it a link to thedvc pull
probablycached outputs
- here not sure if it's better to use DVC-tracked outputs. (otherwise when you read it is bit hard mentally since you are they cached but not present in cache).
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- as it was before, it will simply not restore from run-cache.
Addressed 2 and 3. Thank you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, got it. I guess it's fine for now. No reason to further improve this since we don't have run-cache documented anywhere. So we can keep as is -an advanced option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't have run-cache documented anywhere
BTW the run-cache is already mentioned in 6 cmd refs (published) and in the Data Pipelines page of the GS, which I just noticed/realized just now. I thought we were not going to include any info about experiments until it's more stable? Should we remove these mentions or prioritize documenting run-cache? Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jorgeorpinel Let's not remove it. It is in a semi-official state, people already use it due to cml and other sources. We are on our way to cleaning up the ui overall and publishing experiments.
Run-cache doc by itself doesn't really mean anything to the users, which is why I didn't write it in the summer. It only makes sense in particular commands, so the doc about run-cache internals could wait for the high-level commands.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's not remove remove it. It is in a semi-official state
OK, I agree it's best too keep, but it could be problematic that the run-cache mentions are completely out of context (no explanation of the concept anywhere).
Run-cache doc by itself doesn't really mean anything to the users... the doc about run-cache internals could wait...
Much disagree 🐶 I mean it's not so important whether it's a stand-alone doc or a new section in existing doc(s), but the basics about run-cache seem like a quite important thing to document to me.
It only makes sense in particular commands
Yeah anywhere we want to put it as long as it's published would be great since this is already semi-official.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jorgeorpinel Agreed, I've added the run-cache doc ticket to next sprint, just preliminarily. Thanks 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many thank! 🦊
- `--pull` - try automatically [pulling](/doc/command-reference/pull) missing | ||
cache for outputs restored from run-cache. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Back on this. Per iterative/dvc#4538 (comment):
dvc repro --pull pulls regular files, hashes for which might've been restored from the existing run-cache, so kinda like regular dvc pull
Unfortunately I don't understand either one of the explanations. What's the relationship between run-cache and repro --pull
? Maybe a step-by-step explanation like 1. Use repro --pull
; 2. run-cache is checked before executing commands (default repro behavior I think); 3. Some output hashes are found? (but not the actual files? This is the confusing part); 4. Hashes are looked for in the cache but not found; 5. The files are looked for in remote storage. Something like that
Please @efiop ! Thanks in advance
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jorgeorpinel Even if we leave the run-cache out, repro --pull
would still try to dvc pull
outputs that are missing, but the pipeline didn't change. E.g. when you forgot to dvc pull
beforehand and you are trying to dvc repro
otherwise up-to-date pipeline, so dvc repro --pull
will just pull the outputs for such stages instead of trying to reproduce them.
Run-cache is then just a special source of lock files, and repro --pull
works the same way as explained above.
Want to point out again that --pull
is still a temporary solution that was needed to improve pull --run-cache
that is also not complete in a product sense. So I would recommend not spending much time on this, as the product scenario is WIP and there is no reason to optimize the docs for it too much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK it makes more sense now, thanks.
In this case I do feel like need to spend enough time understanding what's going on so that when the coming bulk of docs related to new features hit, I'm better prepared. So thanks again for baring with me!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Last Q @efiop. Does this only check the default remote (if one is set)? Or all remotes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, 2 more questions...
- Does it check only the local run-cache? Or also the remote run-cache for possible dep/out hashes?
- What happens if you do
repro --pull --no-run-cache
? Is the run-cache check skipped?
Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this only check the default remote (if one is set)? Or all remotes?
Only the default remote right now.
Does it check only the local run-cache? Or also the remote run-cache for possible dep/out hashes?
Yes, only local run-cache.
What happens if you do repro --pull --no-run-cache? Is the run-cache check skipped?
Correct. It will only pull if you have your lock file complete (so hashes are already there, just the outputs are missing from cache), but won't try to use run-cache.
Please feel free to ask any questions, I do understand that this incomplete feature is a bit confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's all I can think of for now. Thanks @efiop! Updated in https://github.com/iterative/dvc.org/pull/1881/files#diff-6c1f3192f09e2722ba169e9fa219b3b5158bbafa470b382c2d6135db7aa1e20d.
Per iterative/dvc#4538
❗ Please read the guidelines in the Contributing to the Documentation list if you make any substantial changes to the documentation or JS engine.
🐛 Please make sure to mention
Fix #issue
(if applicable) in the description of the PR. This causes GitHub to close it automatically when the PR is merged.Please choose to allow us to edit your branch when creating the PR.
Thank you for the contribution - we'll try to review it as soon as possible. 🙏